Statistical power of gene-set enrichment analysis is a function of gene set correlation structure

نویسنده

  • DAVID M. SWANSON
چکیده

We develop an analytic statistical framework for examining a variety of gene-set enrichment analysis tests. Within this framework, we describe why statistical power for both self-contained and competitive gene set tests is a function of the correlation structure of co-expressed genes, and why this characteristic is undesireable for gene-set analyses. We additionally describe why past gene-set tests have su↵ered from inflated type 1 error, and how permutation-based methods have sought to address the issue with some success in the case of self-contained tests and with less success in the case of competitive tests. While the context of this investigation is microarray analysis, with particular focus on leading tests CAMERA, ROAST, SAFE, and GAGE, the observations are also relevant to recently proposed RNAseq gene-set tests, including MAST. The variable statistical power we describe as a function of gene correlation structure has not been studied. While type 1 error inflation has been well-studied and described previously for both self-contained and competitive tests, it has less often been done in an analytical framework and so it is useful to make assumptions explicit and examine parametric distributions.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Comprehensive Computational Analysis of Protein Phenotype Changes Due to Plausible Deleterious Variants of Human SPTLC1 Gene

Genetic variations found in the coding and non-coding regions of a gene are known to influence the structure as well as the function of proteins. Serine palmitoyltransferase long chain subunit 1 a member of α-oxoamine synthase family is encoded by SPTLC1 gene which is a subunit of enzyme serine palmitoyltransferase (SPT). Mutations in SPTLC1 have been associated with hereditary sensory and auto...

متن کامل

Genome-wide Association Study to Identify Genes and Biological Pathways Associated with Type Traits in Cattle using Pathway Analysis

Extended Abstract Introduction and Objective: Type traits describing the skeletal characteristics of an animal are moderately to strongly genetically correlate with other economically important traits in cattle including fertility, longevity and carcass traits. The present study aimed to conduct a genome wide association studies (GWAS) based on gene-set enrichment analysis for identifying the ...

متن کامل

بررسی اثرات تغییر بیان ریز آر ان ای های سلولی ناشی از ویروس پاپیلوم انسانی در سلول های سرطانی سنگفرشی سر و گردن در سطح پروفیل بیان ژنی

Background and aim: Human Papilloma Virus plays an important role in some of human malignancies and causes alterations in normal expression levels of cellular microRNAs. In this paper, we evaluated the effects of such changes on Head and Neck Squamous Cell Carcinoma tumor samples at gene expression profile level. Methods: in this descriptive-analytical study, gene expression profiles of 36 tum...

متن کامل

Identifying Gene Set Association Enrichment Using the Coefficient of Intrinsic Dependence

Gene set testing problem has become the focus of microarray data analysis. A gene set is a group of genes that are defined by a priori biological knowledge. Several statistical methods have been proposed to determine whether functional gene sets express differentially (enrichment and/or deletion) in variations of phenotypes. However, little attention has been given to analyzing the dependence s...

متن کامل

Diagnosis of the disease using an ant colony gene selection method based on information gain ratio using fuzzy rough sets

With the advancement of metagenome data mining science has become focused on microarrays. Microarrays are datasets with a large number of genes that are usually irrelevant to the output class; hence, the process of gene selection or feature selection is essential. So, it follows that you can remove redundant genes and increase the speed and accuracy of classification. After applying the gene se...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017